Dataviz with R

ggplot2 & the grammar of graphics
May 14, 2014

Tony Fujs

Dataviz tools?

WHY USE R & ggplot2?

  • Flexible
  • Powerful

WHY USE R & ggplot2?

  • Flexible
  • Powerful
  • Scaling

Difference between this plot...

and this plot?

ONE LINE OF CODE!!

facet_wrap(~year)
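
The one-line claim can be sketched as follows. This is a minimal illustration, not the workshop's actual plot: the data frame `walmart_years` and its 2006 values are invented for the example, and only `facet_wrap(~year)` comes from the slide.

```r
library(ggplot2)

# Hypothetical data: the 2006 store counts are made up for illustration.
walmart_years <- data.frame(
  year   = rep(c(2005, 2006), each = 3),
  state  = rep(c("FL", "MI", "NJ"), times = 2),
  stores = c(174, 76, 41, 180, 80, 45)
)

# A single-panel scatter plot of stores by state.
p_single <- ggplot(walmart_years, aes(x = state, y = stores)) +
  geom_point()

# Adding one line splits the same plot into one panel per year.
p_facets <- p_single + facet_wrap(~year)

# Type p_facets at the console (or call print(p_facets)) to draw it.
```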

WHY USE R & ggplot2?

  • Flexible
  • Powerful
  • Scaling
  • Reproducible work

WHY USE R & ggplot2?

  • Flexible
  • Powerful
  • Scaling
  • Reproducible work
  • Building block for other tools (lyra, ggvis, SPSS)

What is ggplot2?

  • R package (Tool in the R toolbox)
  • Relies on the Grammar of Graphics (gg)

Barriers to entry

  • R: From point & click to writing code
  • Learning the Grammar of Graphics (gg), as opposed to a typology of chart types

Objective of the workshop

Remove those barriers

  • Understand the gg framework
  • Play with simple code

How do we do it?

  1. gg theory
  2. create simple plots
  3. create complex plot(s)

Napoleon's Russian Campaign: Original

Napoleon's Russian Campaign: ggplot2

Small multiples: Walmart stores

PRACTICE TIME!!

DRAW A SCATTER PLOT OF THE FOLLOWING DATASET

PRACTICE TIME!!

DESCRIBE THE STEPS YOU TOOK TO DRAW THE PLOT

Scatter plots: STEP 1

Scatter plots: STEP 2

Scatter plots: STEP 3

Scatter plots: STEP 4

Scatter plots: STEP 5

Grammar of graphics summary

DATA

Data

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
  • Code
data = mini_walmart
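
To follow along, the table on this slide can be rebuilt by hand. The sketch below assumes `mini_walmart` is an ordinary data frame with exactly the five rows shown; how the workshop actually loaded it is not stated.

```r
# Rebuild the five-row example table shown on the slide.
mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# One row per state, three columns: year, state, stores.
str(mini_walmart)
```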

AESTHETIC MAPPING

Aesthetics: position

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
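
The two fragments above are pieces of a single call. A minimal sketch of how they fit together, assuming a point geom (ggplot2 draws nothing until a geom is added, and the workshop starts from scatter plots):

```r
library(ggplot2)

# The example table from the slide, rebuilt by hand.
mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# data + aesthetic mapping + a geom: state goes to x, stores goes to y.
p_position <- ggplot(data = mini_walmart, aes(x = state, y = stores)) +
  geom_point()

# layer_data() shows what ggplot2 will actually draw:
# the five states become x positions 1 to 5.
built <- layer_data(p_position)
```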

Aesthetics: color

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    color = stores)
data = mini_walmart

Aesthetics: color

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    color = state)
data = mini_walmart

Aesthetics: shape

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    shape = state)
data = mini_walmart
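
The color and shape slides can be compared side by side. The sketch below assumes a point geom: mapping color to the numeric `stores` column gives a gradient, while mapping color or shape to the categorical `state` column gives one distinct value per state.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# Continuous variable mapped to color: a color gradient.
p_color_cont <- ggplot(mini_walmart,
                       aes(x = state, y = stores, color = stores)) +
  geom_point()

# Categorical variable mapped to color and shape: one value per state.
p_color_shape <- ggplot(mini_walmart,
                        aes(x = state, y = stores,
                            color = state, shape = state)) +
  geom_point()
```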

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
#COMPLETE THE CODE TO PRODUCE THIS PLOT
aes(x = state,y = stores)
data = mini_walmart

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
#COMPLETE THE CODE TO PRODUCE THIS PLOT
aes(x = state,y = stores,
    color = state,
    shape = state)
data = mini_walmart

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
# WHAT ADDITIONAL AESTHETIC MAPPING IS NEEDED TO PRODUCE THIS PLOT?
aes(x = state,y = stores)
data = mini_walmart

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
# WHAT ADDITIONAL AESTHETIC MAPPING IS NEEDED TO PRODUCE THIS PLOT?
aes(x = state,y = stores,
    size = stores)
data = mini_walmart

SCALE

Scale: position (default)

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart

Scale: position (default)

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart 
scale_y_continuous()

Scale: position (log)

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart 
scale_y_log10()
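
As a complete call, the log scale is one more component added to the plot. A minimal sketch, assuming a point geom:

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# Same data and mapping as before; only the y scale changes.
p_log <- ggplot(mini_walmart, aes(x = state, y = stores)) +
  geom_point() +
  scale_y_log10()

# The scale transforms the data before drawing:
# the y values in the built plot are log10(stores).
built_log <- layer_data(p_log)
```

The axis labels still show store counts, but the spacing between points now reflects ratios rather than differences.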

Scale: color

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    color = stores)
data = mini_walmart 

Scale: color

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    color = stores)
data = mini_walmart
scale_color_continuous()

Scale: color

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    color = stores)
data = mini_walmart
scale_color_continuous(
  low = 'light green',
  high = 'dark green')
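
A complete version of the custom color scale might look like the sketch below. Two assumptions: it calls `scale_color_gradient()`, the two-color gradient scale, instead of `scale_color_continuous()`, and it spells the colors `lightgreen` and `darkgreen` as they appear in R's `colors()` list.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# Few stores -> light green, many stores -> dark green.
p_green <- ggplot(mini_walmart,
                  aes(x = state, y = stores, color = stores)) +
  geom_point() +
  scale_color_gradient(low = "lightgreen", high = "darkgreen")

# Each store count gets its own shade along the gradient.
built_green <- layer_data(p_green)
```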

GEOMETRIC OBJECTS

Geometric objects: point

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart

Geometric objects: point

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
geom_point()

Geometric objects: bar

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
geom_bar()

TAKE A GUESS: WHAT WILL THIS PLOT LOOK LIKE?

Geometric objects: bar

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
geom_bar()
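
A note for anyone running this today: as far as I know, current ggplot2 releases make `geom_bar()` count rows by default and reject a `y` mapping. The sketch below assumes the intent is bar heights taken from the `stores` column, which needs `stat = "identity"` or the shortcut `geom_col()`.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# Bar heights come straight from the data (no counting).
p_bar <- ggplot(mini_walmart, aes(x = state, y = stores)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for the same thing.
p_col <- ggplot(mini_walmart, aes(x = state, y = stores)) +
  geom_col()

# Each bar runs from 0 up to the store count.
built_bar <- layer_data(p_bar)
```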

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
# COMPLETE THE CODE TO PRODUCE THIS PLOT
aes(x = state,y = stores)
data = mini_walmart

PRACTICE TIME!!

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
# COMPLETE THE CODE TO PRODUCE THIS PLOT
aes(x = state,y = stores)
data = mini_walmart
geom_line()
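
One practical detail when turning this answer into a full call: with a categorical x such as `state`, ggplot2 treats each state as its own group, and a group of one point has no line to draw. The sketch below assumes the fix is to put all rows in a single group with `group = 1`; that mapping is not on the slide.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# group = 1 tells ggplot2 to connect all five states with one line.
p_line <- ggplot(mini_walmart,
                 aes(x = state, y = stores, group = 1)) +
  geom_line()

built_line <- layer_data(p_line)
```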

Geometric objects: text

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
geom_text()

TAKE A GUESS: WHAT WILL THIS PLOT LOOK LIKE?

Geometric objects: text

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores)
data = mini_walmart
geom_text()

Error: geom_text requires the following missing aesthetics: label

Geometric objects: text

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = state,y = stores,
    label = stores)
data = mini_walmart
geom_text()
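
Put together as one call, the working version looks like this. A minimal sketch: the only change from the failing slide is the extra `label` mapping.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# geom_text() needs to know WHAT to write at each (x, y) position:
# here, the store count itself.
p_text <- ggplot(mini_walmart,
                 aes(x = state, y = stores, label = stores)) +
  geom_text()

built_text <- layer_data(p_text)
```

Swapping in `label = state` would print the state abbreviations instead.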

POSITION ADJUSTMENT

Position adjustment: identity

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores)
data = mini_walmart
geom_point()

Position adjustment: identity

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores)
data = mini_walmart
geom_point()
position = 'identity'

Position adjustment: identity

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores)
data = mini_walmart
geom_bar()
position = 'identity'

TAKE A GUESS: WHAT WILL THIS PLOT LOOK LIKE?

Position adjustment: identity

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores)
data = mini_walmart
geom_bar()
position = 'identity'

Position adjustment: identity

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'identity'

Position adjustment: dodge

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'dodge'

Position adjustment: stack

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'stack'

Position adjustment: fill

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'fill'
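
The four position adjustments can be compared in one sketch. Assumptions: `geom_col()` stands in for the bar geom so the code runs on current ggplot2, and `position` is passed as an argument to the geom, which is where it goes in a full call.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# Shared data and mapping: all five bars sit at the same x (year 2005).
base <- ggplot(mini_walmart, aes(x = year, y = stores, fill = state))

p_identity <- base + geom_col(position = "identity") # bars overlap
p_dodge    <- base + geom_col(position = "dodge")    # bars side by side
p_stack    <- base + geom_col(position = "stack")    # bars on top of each other
p_fill     <- base + geom_col(position = "fill")     # stacked, rescaled to 0..1
```

With `identity` the tallest bar tops out at 174, with `stack` the pile reaches the total of 317, and with `fill` it always reaches 1.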

COORDINATE SYSTEM

COORDINATE SYSTEM: cartesian

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'fill'

COORDINATE SYSTEM: cartesian

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'fill'
coord_cartesian()

COORDINATE SYSTEM: polar

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'fill'
coord_polar()

TAKE A GUESS: WHAT WILL THIS PLOT LOOK LIKE?

COORDINATE SYSTEM: polar

  year state stores
1 2005    FL    174
2 2005    MI     76
3 2005    NJ     41
4 2005    NV     22
5 2005    VT      4
aes(x = year,y = stores,
    fill = state)
data = mini_walmart
geom_bar()
position = 'fill'
coord_polar()
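
For those who want to check their guess, here is a sketch of the full call, with `geom_col()` assumed in place of the bar geom. By default `coord_polar()` maps x to the angle, so with a single year the stacked segments should come out as concentric rings; the `theta = "y"` variant is an addition of mine that maps the stacked values to the angle instead, which gives a pie chart.

```r
library(ggplot2)

mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4)
)

# The stacked, rescaled bar from the previous slides.
bars <- ggplot(mini_walmart, aes(x = year, y = stores, fill = state)) +
  geom_col(position = "fill")

# Default polar coordinates: angle = x, radius = y.
polar_default <- coord_polar()
p_rings <- bars + polar_default

# Variant (not on the slide): angle = y, radius = x.
polar_pie <- coord_polar(theta = "y")
p_pie <- bars + polar_pie
```

Nothing about the data, mapping, or geom changes between the bar, the rings, and the pie; only the coordinate system does.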

PRACTICE TIME!!

ADD A REGRESSION LINE TO YOUR HAND DRAWN SCATTER PLOT

STATISTICAL TRANSFORMATION

STATISTICAL TRANSFORMATION: identity

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_point()

STATISTICAL TRANSFORMATION: identity

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_point(stat = 'identity')

STATISTICAL TRANSFORMATION: smooth

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_point(stat = 'smooth')

STATISTICAL TRANSFORMATION: smooth

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_line(stat = 'smooth')

LAYER

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_point(stat = 'identity')

LAYER

  year state stores share
1 2005    FL    174  0.40
2 2005    MI     76  0.30
3 2005    NJ     41  0.15
4 2005    NV     22  0.10
5 2005    VT      4  0.05
aes(x = stores, y = share)
data = mini_walmart
geom_point(stat = 'identity') +
geom_line(stat = 'smooth')
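
The two layers can be assembled into one runnable call. This is a sketch under two assumptions: the smoother is a straight-line fit (`method = "lm"`), chosen because the practice slide asks for a regression line and because the default smoother may struggle with only five points; and `formula = y ~ x` is spelled out to avoid a message.

```r
library(ggplot2)

# The example table including the share column.
mini_walmart <- data.frame(
  year   = rep(2005, 5),
  state  = c("FL", "MI", "NJ", "NV", "VT"),
  stores = c(174, 76, 41, 22, 4),
  share  = c(0.40, 0.30, 0.15, 0.10, 0.05)
)

p_layers <- ggplot(mini_walmart, aes(x = stores, y = share)) +
  # Layer 1: raw data, drawn as points (stat = identity changes nothing).
  geom_point(stat = "identity") +
  # Layer 2: fitted values from a linear model, drawn as a line.
  geom_line(stat = "smooth", method = "lm", formula = y ~ x)

raw_points  <- layer_data(p_layers, 1) # the 5 original rows
fitted_line <- layer_data(p_layers, 2) # many points along the fitted line
```

Both layers share the same data and aesthetic mapping; they differ only in the statistical transformation and the geom.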

LAYER